A Memory-Based Approach to Anti-Spam Filtering
Abstract
This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, also known as “spam”, floods the mailboxes of users, causing frustration, wasting bandwidth and money, and exposing minors to unsuitable content. Using a recently introduced publicly available corpus, a thorough investigation of the effectiveness of a memory-based anti-spam filter is performed, including different attribute and distance weighting schemes, and studies on the effect of the neighborhood size, the size of the attribute set, and the size of the training corpus. Three different cost scenarios are identified, and suitable cost-sensitive evaluation functions are employed. We conclude that memory-based anti-spam filtering is practically feasible, especially when combined with additional safety nets. Compared to a previously tested Naïve Bayes filter, the memory-based filter performs better on average, particularly when the misclassification cost for non-spam messages is high.
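As a rough illustration of the ideas named in the abstract (a stored set of training messages, a neighborhood of size k, attribute and distance weighting, and a cost-sensitive decision), the following is a minimal sketch and not the paper's implementation. The binary attribute vectors, the overlap distance, the inverse-distance vote, the `lam` cost parameter, the threshold `lam / (1 + lam)`, and all function names are assumptions chosen for the example.

```python
from collections import namedtuple

# A stored training message: a tuple of binary attributes plus its label.
Message = namedtuple("Message", ["features", "is_spam"])

def overlap_distance(a, b, weights=None):
    """Sum the weights of the attribute positions where a and b differ."""
    if weights is None:
        weights = [1.0] * len(a)  # assumption: equal attribute weights by default
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def spam_confidence(query, memory, k=3, weights=None):
    """Share of the distance-weighted vote that the k nearest messages cast for spam."""
    ranked = sorted(memory, key=lambda m: overlap_distance(query, m.features, weights))
    spam_vote = total_vote = 0.0
    for m in ranked[:k]:
        # Hypothetical distance weighting: closer neighbors get larger votes.
        vote = 1.0 / (1.0 + overlap_distance(query, m.features, weights))
        total_vote += vote
        if m.is_spam:
            spam_vote += vote
    return spam_vote / total_vote if total_vote else 0.0

def classify(query, memory, k=3, lam=1.0, weights=None):
    """Return True (spam) only if the spam vote share exceeds lam / (1 + lam).

    lam is an assumed cost ratio: blocking a legitimate message is treated
    as lam times worse than letting a spam message through.
    """
    threshold = lam / (1.0 + lam)
    return spam_confidence(query, memory, k, weights) > threshold

# Toy memory of five messages over four made-up binary attributes.
memory = [
    Message((1, 1, 0, 0), True),
    Message((1, 0, 0, 0), True),
    Message((0, 0, 1, 1), False),
    Message((0, 0, 1, 0), False),
    Message((0, 1, 1, 0), False),
]
query = (1, 1, 0, 1)
print(classify(query, memory, k=3, lam=1.0))
print(classify(query, memory, k=3, lam=9.0))
```

Raising `lam` makes the sketch more reluctant to label a message as spam, which is one simple way to reflect a high misclassification cost for non-spam messages.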
Similar resources
Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach
We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an ...
SPAM -- Technological and Legal Aspects
In this paper an attempt is made to review the technological, economic and legal aspects of spam in detail. The technical details will include different techniques of spam control, e.g., filtering techniques, Genetic Algorithm, Memory Based Classifier, Support Vector Machine Method, etc. The economic aspect includes Shaping/Rate Throttling Approach/Economic Filtering and Pricing/Payment based ...
A Case-Based Approach to Spam Filtering that Can Track Concept Drift
There are a few key benefits of a case-based approach to spam filtering. First, the many different sub-types of spam suggest that a local learner, such as Case-Based Reasoning (CBR) will perform well. Second, the lazy approach to learning in CBR allows for easy updating as new types of spam arrive. Third, the case-based approach to spam filtering allows for the sharing of cases and thus a shari...
Survey on Spam Filtering Techniques
In recent years spam has become a major problem for the Internet and electronic communication, and many techniques have been developed to fight it. This paper gives an overview of existing e-mail spam filtering methods. The classification, evaluation, and comparison of traditional and learning-based methods are provided. Some personal anti-spam products are tested and compared. The statement f...
Evolutionary Symbiotic Feature Selection for Email Spam Detection
This work presents a symbiotic filtering approach enabling the exchange of relevant word features among different users in order to improve local anti-spam filters. The local spam filtering is based on a Content-Based Filtering strategy, where word frequencies are fed into a Naive Bayes learner. Several Evolutionary Algorithms are explored for feature selection, including the proposed symbiotic ...